Generalizations of the clustering coefficient to weighted complex networks 

The recent high level of interest in weighted complex networks gives rise to a need to develop new 
measures and to generalize existing ones to take the weights of links into account. Here we focus 
on various generalizations of the clustering coefficient, which is one of the central characteristics in 
complex network theory. We present a comparative study of the several suggestions introduced 
in the literature, and point out their advantages and limitations. The concepts are illustrated by 
simple examples as well as by empirical data of the world trade and weighted coauthorship networks. 

The study of networks has become a central topic in 
the science of complex systems [1, 2, 3]. In the network 
approach, interacting elements are depicted as vertices 
in the network and their interactions as edges connecting 
the vertices. The inherent strength of this approach lies 
in its ability to capture some of the essential character- 
istics of interacting systems by disregarding the detailed 
nature of both the constituents and the interactions be- 
tween them. Studies on structural properties of complex 
networks have revealed features common to a large num- 
ber of natural and man-made systems, such as short aver- 
age path lengths, broad degree distributions, modularity, 
and high level of clustering. 

Recently, it has become increasingly clear that in or- 
der to understand better the properties of the system, 
it is necessary to take into account some of its hitherto 
omitted details. In particular, understanding the 
heterogeneity of interaction strengths and their correlations 
with network topology is fundamental in studies of sev- 
eral types of networked systems, e.g. social and traffic 
networks. This heterogeneity can be taken into account 
by assigning weights to the network edges to quantify, 
e.g. fluxes in traffic-related networks (air traffic, In- 
ternet), strengths of social ties [1], correlations between 
stock returns Q, and trade volumes between countries. 

Incorporating this additional degree of freedom in the 
complex networks framework calls for entirely novel mea- 
sures as well as generalizations of the existing ones. Some 
of these measures are readily generalizable, e.g. the vertex 
degree $k_i$, denoting the number of edges connected to 
vertex $i$. For this the natural weighted counterpart is the 
vertex strength $s_i = \sum_{j \in \nu_i} w_{ij}$, where $\nu_i$ denotes the 
neighbourhood of $i$ and $w_{ij}$ are the weights of edges 
emanating from $i$ [4]. Unfortunately, not all existing network 
characteristics can be generalized in such a straightfor- 
ward manner. Here we will focus on the several alterna- 
tive definitions proposed in the recent literature for the 
weighted clustering coefficient. 
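As a minimal sketch of the two quantities just mentioned, the snippet below computes the degree and the strength of a vertex. The nested-dictionary layout `weights[i][j]` and the three-vertex example are assumptions made purely for illustration; they are not taken from the data sets studied here.

```python
def degree_and_strength(weights, i):
    """Return (k_i, s_i): the degree and the strength of vertex i.

    `weights` is assumed to be a symmetric dict of dicts with
    weights[i][j] = w_ij for existing edges only (no self-edges).
    """
    neighbours = weights[i]
    k_i = len(neighbours)           # number of edges attached to i
    s_i = sum(neighbours.values())  # sum of the weights of those edges
    return k_i, s_i


# Hypothetical three-vertex example.
weights = {
    "a": {"b": 2.0, "c": 0.5},
    "b": {"a": 2.0, "c": 1.0},
    "c": {"a": 0.5, "b": 1.0},
}
k_a, s_a = degree_and_strength(weights, "a")
print(k_a, s_a)  # 2 2.5
```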



A large number of networks show a tendency for link 
formation between neighbouring vertices, i.e. the net- 
work topology deviates from uncorrelated random net- 
works in which triangles are sparse. This tendency is 
called clustering and it reflects the clustering of 
edges into tightly connected neighbourhoods. Its origins 
can be traced back to sociology, where similar concepts 
have been used [13, [HI - in a typical social network, the 
friends of a person are very likely to know each other. 
The clustering around a vertex $i$ is quantified by the (unweighted) 
clustering coefficient $C_i$, defined as the number 
of triangles in which vertex $i$ participates normalized by 
the maximum possible number of such triangles: 

\[ C_i = \frac{2 t_i}{k_i (k_i - 1)}, \tag{1} \]

where $t_i$ denotes the number of triangles around $i$. Hence 
$C_i = 0$ if none of the neighbours of a vertex are connected, 
and $C_i = 1$ if all of the neighbours are connected. 
In network analysis this quantity can then be averaged 
over the entire network or by vertex degree. 
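The definition above translates directly into a few lines of code. The sketch below counts the connected neighbour pairs of a vertex and divides by the number of possible pairs; the adjacency layout (a dict of neighbour sets), the example graph, and the convention of returning 0 for vertices with fewer than two neighbours are assumptions for illustration.

```python
from itertools import combinations


def clustering(adj, i):
    """Unweighted clustering coefficient: 2 * t_i / (k_i * (k_i - 1))."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: no neighbour pairs, no clustering
    # t_i = number of neighbour pairs (j, l) that are themselves connected
    t = sum(1 for j, l in combinations(nbrs, 2) if l in adj[j])
    return 2.0 * t / (k * (k - 1))


# Hypothetical graph: triangle 0-1-2 plus a pendant vertex 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c0 = clustering(adj, 0)  # one triangle out of three possible pairs
c1 = clustering(adj, 1)  # both neighbours of 1 are connected
```

Averaging `clustering(adj, i)` over all vertices, or over vertices of equal degree, gives the network-level quantities mentioned in the text.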

By extending the above line of reasoning, the weighted 
clustering coefficient should also take into account how 
much weight is present in the neighbourhood of the ver- 
tex, compared to some limiting case. Evidently, this can 
be done in several ways, and in what follows we focus on 
four existing definitions. In all these formulas, $w_{ii} = 0$ for all 
$i$, i.e., self-edges are not allowed, and $j, k \in \nu_i$. 

- Barrat et al. were the first to propose a weighted 
version of the clustering coefficient [4]. Their definition 
reads as follows: 

\[ \tilde{C}_{i,B} = \frac{1}{s_i (k_i - 1)} \sum_{j,k} \frac{w_{ij} + w_{ik}}{2} \, a_{ij} a_{jk} a_{ik}, \tag{2} \]

where $a_{ij} = 1$ if there is an edge between $i$ and $j$, and $0$ 
otherwise. Noting that $s_i = k_i (s_i / k_i) = k_i \langle w_i \rangle$, this 
may also be written as 

\[ \tilde{C}_{i,B} = \frac{1}{k_i (k_i - 1)} \sum_{j,k} \frac{1}{\langle w_i \rangle} \frac{w_{ij} + w_{ik}}{2} \, a_{ij} a_{jk} a_{ik}, \]

where $\langle w_i \rangle = \sum_j w_{ij} / k_i$. As the rewritten form shows 
clearly, the contribution of each triangle is weighted by a 
ratio of the average weight of the two adjacent edges of 
the triangle to the average weight $\langle w_i \rangle$. 
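A minimal sketch of this coefficient, assuming the usual form of the Barrat et al. definition (sum over ordered neighbour pairs of the mean weight of the two edges leaving the focal vertex, divided by $s_i (k_i - 1)$). The dict-of-dicts layout and the example weights are hypothetical.

```python
from itertools import permutations


def barrat_clustering(w, i):
    """Barrat-style weighted clustering of vertex i.

    `w` is assumed to be a symmetric dict of dicts of positive weights,
    holding existing edges only.
    """
    nbrs = w[i]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for vertices without neighbour pairs
    s = sum(nbrs.values())
    # Ordered pairs (j, l): every triangle is visited twice, as in the sum over j, k.
    total = sum(
        (w[i][j] + w[i][l]) / 2.0
        for j, l in permutations(nbrs, 2)
        if l in w[j]
    )
    return total / (s * (k - 1))


# Hypothetical example: triangle 0-1-2 with heavy edges at 0, plus a light edge 0-3.
w = {
    0: {1: 2.0, 2: 2.0, 3: 1.0},
    1: {0: 2.0, 2: 1.0},
    2: {0: 2.0, 1: 1.0},
    3: {0: 1.0},
}
c_b = barrat_clustering(w, 0)  # 4 / (5 * 2) = 0.4
```

In this example the unweighted coefficient of vertex 0 is 1/3; the weighted value is larger because the edges that close the triangle carry more than the average weight around the vertex.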

TABLE I: Motivation and comparison of selected features for 
different weighted clustering coefficients. 

Coefficient  Motivation
C_B          Reflects how much of vertex strength is associated with
             adjacent triangle edges
C_O          Reflects how large triangle weights are compared to
             network maximum
C_Z          Purely weight-based; insensitive to additive noise which
             may result in appearance of "false positive" edges with
             small weights
C_H          Similar to C_Z, based only on edge weights

Feature (x = present, - = absent)                      C_B  C_O  C_Z  C_H
1) Weighted C equals unweighted C when weights
   become binary                                        x    x    x    -
2) Weighted C lies in [0, 1]                            x    x    x    -
3) Uses global max(w) in normalization                  -    x    x    x
4) Takes into account weights of all edges in
   triangles                                            -    x    -    x
5) Invariant to weight permutation for one triangle     -    x    -    -
6) Takes into account weights of edges not
   participating in any triangle                        x    -    x    x

- Onnela et al. proposed a version [12] of the weighted clustering 
coefficient based on the concept of subgraph intensity, 
defined as the geometric average of subgraph edge 
weights, resulting in: 

\[ \tilde{C}_{i,O} = \frac{1}{k_i (k_i - 1)} \sum_{j,k} \left( \hat{w}_{ij} \hat{w}_{ik} \hat{w}_{jk} \right)^{1/3}. \tag{3} \]

Here the edge weights are normalized by the maximum 
weight in the network, $\hat{w}_{ij} = w_{ij} / \max(w)$, and the contribution 
of each triangle depends on all of its edge weights. 
Thus triangles in which each edge weight equals $\max(w)$ 
contribute unity to the sum, while a triangle having one 
link with a negligible weight will have a negligible contribution. 
This definition can be rewritten as 

\[ \tilde{C}_{i,O} = C_i \bar{I}_i, \tag{4} \]

where $C_i$ is the unweighted clustering coefficient and $\bar{I}_i$ 
denotes the average (normalized) intensity of triangles in 
which vertex $i$ participates. 
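The factorization into a topological part and an average triangle intensity can be checked numerically. The sketch below assumes the geometric-mean form of the Onnela et al. coefficient with weights normalized by the network maximum; the function name, data layout, and example graph are illustrative only.

```python
from itertools import combinations


def onnela_clustering(w, i):
    """Return (C_O, C, mean_intensity) for vertex i.

    `w` is assumed to be a symmetric dict of dicts of positive edge weights.
    """
    w_max = max(x for nbrs in w.values() for x in nbrs.values())
    nbrs = w[i]
    k = len(nbrs)
    # Intensity of each triangle (i, j, l): geometric mean of normalized weights.
    intensities = [
        (w[i][j] * w[i][l] * w[j][l] / w_max ** 3) ** (1.0 / 3.0)
        for j, l in combinations(nbrs, 2)
        if l in w[j]
    ]
    if not intensities:
        return 0.0, 0.0, 0.0  # no triangles around i
    pairs = k * (k - 1) / 2.0                 # number of neighbour pairs
    c_unweighted = len(intensities) / pairs   # t_i / pairs
    c_onnela = sum(intensities) / pairs
    mean_intensity = sum(intensities) / len(intensities)
    return c_onnela, c_unweighted, mean_intensity


# Hypothetical example: triangle 0-1-2 plus an extra edge 0-3.
w = {
    0: {1: 2.0, 2: 2.0, 3: 1.0},
    1: {0: 2.0, 2: 1.0},
    2: {0: 2.0, 1: 1.0},
    3: {0: 1.0},
}
c_o, c, mean_i = onnela_clustering(w, 0)
```

Here the single triangle has normalized weights (1, 1, 1/2), so its intensity is the cube root of 1/2, and the weighted coefficient is that intensity multiplied by the unweighted value 1/3.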

- Zhang et al. have defined, in the context of gene co-expression 
networks, the weighted clustering coefficient [14] 
as 

\[ \tilde{C}_{i,Z} = \frac{\sum_{j,k} \hat{w}_{ij} \hat{w}_{jk} \hat{w}_{ik}}{\left( \sum_k \hat{w}_{ik} \right)^2 - \sum_k \hat{w}_{ik}^2}, \tag{5} \]

where the weights have again been normalized by 
$\max(w)$ as above. The logic behind this definition is the 
following: the number of triangles around vertex $i$ can 
be written in terms of the adjacency matrix elements as 
$t_i = \frac{1}{2} \sum_{j,k} a_{ij} a_{jk} a_{ik}$, and the numerator of Eq. (5) is 
simply a weighted generalization of this formula. The 
denominator has been chosen by considering the upper 
bound of the numerator, ensuring $\tilde{C}_{i,Z} \in [0, 1]$. This 
formula can also be written as 

\[ \tilde{C}_{i,Z} = \frac{\sum_{j,k} \hat{w}_{ij} \hat{w}_{jk} \hat{w}_{ik}}{\sum_{j \neq k} \hat{w}_{ij} \hat{w}_{ik}}. \]

A similar definition has also been presented in Refs. [15, 
16], where the edge weights are interpreted as probabilities 
such that in an ensemble of networks, $i$ and $j$ are 
connected with probability $w_{ij}$. 
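A short sketch of this coefficient, assuming the form in which the numerator sums products of the three normalized triangle weights and the denominator is the squared sum of the normalized weights at the focal vertex minus the sum of their squares. The optional `w_max` argument and the `triangle` helper are hypothetical conveniences that let a single triangle be treated as part of a larger network whose maximum weight is 1.

```python
def zhang_clustering(w, i, w_max=None):
    """Zhang-style weighted clustering of vertex i.

    `w` is assumed to be a symmetric dict of dicts of positive weights.
    If `w_max` is not given, the largest weight found in `w` is used.
    """
    if w_max is None:
        w_max = max(x for nbrs in w.values() for x in nbrs.values())
    hat = {j: x / w_max for j, x in w[i].items()}  # normalized weights at i
    numerator = sum(
        hat[j] * hat[l] * w[j].get(l, 0.0) / w_max
        for j in hat for l in hat if j != l
    )
    denominator = sum(hat.values()) ** 2 - sum(x * x for x in hat.values())
    return numerator / denominator if denominator > 0 else 0.0


def triangle(w_spoke, w_outer):
    """Hypothetical single triangle: i has two neighbours j and k."""
    return {
        "i": {"j": w_spoke, "k": w_spoke},
        "j": {"i": w_spoke, "k": w_outer},
        "k": {"i": w_spoke, "j": w_outer},
    }


strong_spokes = zhang_clustering(triangle(1.0, 0.25), "i", w_max=1.0)
weak_spokes = zhang_clustering(triangle(0.5, 0.25), "i", w_max=1.0)
```

Both calls return 0.25, the normalized weight of the outer edge: when all neighbours are connected and the outer weights are equal, the weights of the edges leaving the focal vertex cancel between numerator and denominator.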

- Holme et al. have defined the weighted clustering coefficient 
as [13] 

\[ \tilde{C}_{i,H} = \frac{\sum_{j,k} w_{ij} w_{jk} w_{ki}}{\max(w) \sum_{j,k} w_{ij} w_{ki}} = \frac{(\mathbf{W}^3)_{ii}}{(\mathbf{W} \mathbf{W}_{\max} \mathbf{W})_{ii}}, \]

where $\mathbf{W}$ denotes the weight matrix, and $\mathbf{W}_{\max}$ a matrix 
where each entry equals $\max(w)$. The lines of reasoning 
look similar to those of Ref. [14]; however, $j \neq k$ is not 
required in the denominator sum. 
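The effect of allowing $j = k$ in the denominator is easy to see numerically. The sketch below assumes the matrix form given for this coefficient, written out as explicit sums; the data layout and the complete-graph example are illustrative assumptions.

```python
def holme_clustering(w, i):
    """Holme-style weighted clustering of vertex i.

    `w` is assumed to be a symmetric dict of dicts of positive weights.
    """
    w_max = max(x for nbrs in w.values() for x in nbrs.values())
    nbrs = w[i]
    # Numerator: (W^3)_ii = sum over j, l of w_ij * w_jl * w_li.
    numerator = sum(
        nbrs[j] * w[j].get(l, 0.0) * nbrs[l]
        for j in nbrs for l in nbrs
    )
    # Denominator: (W W_max W)_ii = max(w) * (sum_j w_ij)^2, including j == l.
    denominator = w_max * sum(nbrs.values()) ** 2
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical example: complete graph on four vertices with binary weights.
k4 = {a: {b: 1.0 for b in range(4) if b != a} for a in range(4)}
c_h = holme_clustering(k4, 0)  # 2 * t_i / k_i**2 = 6 / 9
```

Every neighbour pair of vertex 0 is connected, so the unweighted coefficient is 1, yet the value here is 2/3 because the denominator also counts the three $j = k$ terms.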

Table I presents the selected features of the four 
weighted clustering coefficients and illustrates their differences. 
These features are discussed in detail below. In 
Table I and in what follows, $\tilde{C}$ denotes the weighted clustering 
coefficient and $C$ the corresponding unweighted 
coefficient, with properties summarized below: 

1. $\tilde{C} = C$ when weights are binary, i.e., $w_{ij} = 1$ if 
$i$ and $j$ are connected. This condition is fulfilled 
by all weighted clustering coefficients except that 
of Holme et al. When the weights are set to binary, 
$\tilde{C}_{i,H} = 2 t_i / k_i^2$, which approaches the unweighted 
coefficient only when $k_i \gg 1$. 

2. $\tilde{C} \in [0, 1]$. This is true for all weighted coefficients 
except $\tilde{C}_{i,H}$, which never reaches unity for the reason 
mentioned above. Let us consider the limiting 
values ($\tilde{C}_i = 0$ and $\tilde{C}_i = 1$) in more detail. For all coefficients, 
$\tilde{C}_i = 0$ signifies the absence of triangles. 
A necessary condition for $\tilde{C}_{i,B} = 1$, $\tilde{C}_{i,O} = 1$, and 
$\tilde{C}_{i,Z} = 1$ is that edges exist between all neighbours 
of vertex $i$. However, each coefficient sets a different 
requirement for the weights. When $C_i = 1$, then 
$\tilde{C}_{i,B} = 1$ irrespective of the edge weights. Contrary 
to this, $\tilde{C}_{i,O} = 1$ requires that the weights of 
all edges $w_{ij} = w_{jk} = \max(w)$, i.e., all weights in 
each triangle are equal to the maximum weight in 
the network. Finally, $\tilde{C}_{i,Z} = 1$ if each "outer" edge 
$w_{jk} = \max(w)$, irrespective of the weights $w_{ij}$ of 
edges emanating from $i$. 

3. Global $\max(w)$ is used in normalization. This is 
true for all versions except $\tilde{C}_{i,B}$, where only the 
local strength $s_i$ matters. This particular choice 
means that within the same network, two vertices 
whose neighbourhood topology and relative weight 
configuration are similar can have the same values 
of $\tilde{C}_B$ even if all the weights in the neighbourhood of 
one vertex are small and those in the neighbourhood 
of the other are large. 

4. Weights of all edges of triangles in which $i$ participates 
are taken into account. This is true for 
$\tilde{C}_{i,O}$ and $\tilde{C}_{i,H}$. However, $\tilde{C}_{i,B}$ takes into account 
only the weights of the edges connected to $i$. When 
$C_i = 1$ and all $w_{jk}$, $j \neq k$, are equal, $\tilde{C}_{i,Z} = w_{jk} / \max(w)$, i.e. 
it is insensitive to the weights $w_{ij}$. 

5. Invariance to permutation of weights within a single 
triangle. This feature is present only in $\tilde{C}_{i,O}$, 
showing that it deals with the triangles as an entity. 

6. Weights of edges not participating in triangles are 
taken into account. This is the case for all definitions 
except $\tilde{C}_{i,O}$, where such edges only enter 
through the vertex degree $k_i$. 

[Figure 1: diagrams of seven weight configurations, not reproduced; the values below follow the column order of the original figure.]

Configuration    1     2     3     4     5     6     7
C_B              1     1     1   ~1/2   ~0    1/3  ~1/2
C_O             ~0    ~0    ~0    1/3   ~0    ~0    ~0
C_Z              1    ~0     1    ~1    ~0    1/3   ~0

FIG. 1: Values of the weighted clustering coefficients for different 
weight configurations when vertex $i$ (solid circle) participates 
in a single triangle. Solid lines depict edges of 
weight $w = \max(w) = 1$, whereas dashed lines depict 
edges with vanishingly small weights $w = \epsilon \ll 1$. Note that 
in many cases different weight configurations yield the same 
coefficient values. 
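For the simplest case covered by the properties above, a vertex $i$ whose only two neighbours $j$ and $k$ are connected to each other ($k_i = 2$, $t_i = 1$), three of the coefficients reduce to closed forms. The derivation below is a sketch that assumes the standard forms of the Barrat, Onnela, and Zhang definitions with $\hat{w} = w / \max(w)$; the factor 2 arises because the sums run over both orderings of the pair $(j, k)$.

```latex
% Single triangle: k_i = 2, neighbours j and k, s_i = w_{ij} + w_{ik}.
\begin{align*}
\tilde{C}_{i,B} &= \frac{1}{s_i (k_i - 1)} \cdot 2 \cdot \frac{w_{ij} + w_{ik}}{2}
                 = \frac{w_{ij} + w_{ik}}{w_{ij} + w_{ik}} = 1, \\
\tilde{C}_{i,O} &= \frac{1}{k_i (k_i - 1)} \cdot 2 \left( \hat{w}_{ij} \hat{w}_{ik} \hat{w}_{jk} \right)^{1/3}
                 = \left( \hat{w}_{ij} \hat{w}_{ik} \hat{w}_{jk} \right)^{1/3}, \\
\tilde{C}_{i,Z} &= \frac{2 \hat{w}_{ij} \hat{w}_{ik} \hat{w}_{jk}}
                        {(\hat{w}_{ij} + \hat{w}_{ik})^2 - \hat{w}_{ij}^2 - \hat{w}_{ik}^2}
                 = \hat{w}_{jk}.
\end{align*}
```

In this configuration $\tilde{C}_{i,B}$ is blind to the weights, $\tilde{C}_{i,O}$ vanishes as soon as any one of the three edges has a negligible weight, and $\tilde{C}_{i,Z}$ depends only on the outer edge.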

These differences are depicted also in Figure 1, which 
displays the value of the clustering coefficient for a vertex 
participating in one triangle with varying weight configurations, 
including vanishingly small weights. In Fig. 1 
and in the following, analysis of $\tilde{C}_{i,H}$ is omitted as it is 
closely related to $\tilde{C}_{i,Z}$ but is normalized in a way which 
can be viewed as incorrect. Next, we compare the behavior 
of the different coefficients in two different empirical 
networks. 

FIG. 2: Clustering coefficients computed for the international 
trade network (ITN) as function of vertex strength $s$: unweighted 
$C$ (□) and weighted $\tilde{C}_B$ (o), $\tilde{C}_O$ (o), and $\tilde{C}_Z$ (△). 
Inset: closer view on $C$ and $\tilde{C}_B$ with a linear vertical axis. 

- International Trade Network (ITN): The ITN is constructed 
from trade records between the world's countries 
during the year 2000, such that vertices denote 
countries, edges trade relationships, and edge weights 
trade volumes. The source data [17] includes the dollar 
volumes of exports and imports between countries but, 
due to different reporting procedures, there are usually 
small differences between exports $\mathrm{exp}_{ij}$ from $i$ to $j$ and 
imports $\mathrm{imp}_{ji}$ to $j$ from $i$. We have chosen the edge 
weights $w_{ij}$ as a measure of the total trade volume such 
that $w_{ij} = \frac{1}{2} (\mathrm{exp}_{ij} + \mathrm{exp}_{ji} + \mathrm{imp}_{ij} + \mathrm{imp}_{ji})$, averaging 
over the aforementioned discrepancies. The network 
constructed in this manner has $N = 187$ countries connected 
with $E = 10252$ edges, i.e., it has a relatively high edge 
density of 59 % (10252 of the 17391 possible country pairs). 
High-trade-volume countries typically 
engage in high-volume trade with each other and, thus, in 
the network the high-weight edges are clustered, forming 
a "rich-club". 
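The symmetrization used for the ITN edge weights can be sketched as follows. The dictionaries, the country codes, and the trade figures are invented for illustration; treating a missing record as zero is an assumption of this sketch, not something stated for the source data.

```python
def trade_weight(exports, imports, i, j):
    """Edge weight w_ij = (exp_ij + exp_ji + imp_ij + imp_ji) / 2.

    exports[(a, b)]: exports from a to b as reported by a.
    imports[(a, b)]: imports to a from b as reported by a.
    Missing records are treated as 0.0 (assumption of this sketch).
    """
    flows = (
        exports.get((i, j), 0.0),
        exports.get((j, i), 0.0),
        imports.get((i, j), 0.0),
        imports.get((j, i), 0.0),
    )
    return 0.5 * sum(flows)


# Hypothetical records: each flow is reported twice with a small discrepancy.
exports = {("FI", "SE"): 10.0, ("SE", "FI"): 8.0}
imports = {("SE", "FI"): 10.5, ("FI", "SE"): 7.5}

w_fi_se = trade_weight(exports, imports, "FI", "SE")  # 0.5 * 36.0 = 18.0
```

Because each flow appears once as an export and once as an import, the factor 1/2 averages the two reports while the sum over both directions gives the total trade volume between the pair.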

Figure 2 depicts the different weighted clustering coefficients 
as function of vertex strength $s$. The unweighted 
clustering coefficient $C$ is also displayed for reference. 
Due to the large number of edges, $C$ remains high for 
all $s$. For low $s$, $\tilde{C}_B$ follows $C$ very closely, whereas for 
high values of $s$, $\tilde{C}_B$ gets values higher than $C$, which can 
be attributed to high-trade-volume countries engaging in 
mutual high-volume trade. This effect is far more pronounced 
in $\tilde{C}_O$, which displays a power-law like increasing 
trend $\tilde{C}_O(s) \propto s^{\beta}$ with $\beta \approx 0.4$, spanning several 
decades. This effect is almost purely due to the behavior 
of the average triangle intensity $\bar{I}$ (see Eq. (4)), as the 
unweighted $C$ changes only a little. $\tilde{C}_Z$ is seen to remain 
rather insensitive to the weights. Note that the overall 
level of $\tilde{C}_O$ and $\tilde{C}_Z$ is much lower than that of $C$ and $\tilde{C}_B$ 
due to weight normalization by the global $\max(w)$ and 
to a broad distribution of weights. 

FIG. 3: Clustering coefficients computed for the scientific collaboration 
network (SCN) as function of vertex degree $k$: unweighted 
$C$ (□), $\tilde{C}_B$ (o), $\tilde{C}_O$ (o), and $\tilde{C}_Z$ (△). The lower 
panel displays the ratio of the weighted coefficients to the unweighted 
$C$; each curve has been linearly scaled between its 
minimum and maximum values to facilitate comparison. 

- Scientific Collaboration Network (SCN): The SCN 
is constructed from scientists [18] who have jointly authored 
manuscripts submitted to the condensed matter 
physics e-print archive (http://www.arxiv.org) from 
1995 to 2005. In this network, vertices correspond 
to scientists and edges to co-authorships of papers. 
The edge weights have been defined such that 
$w_{ij} = \sum_p \delta_i^p \delta_j^p / (n_p - 1)$, where the index $p$ runs over all 
the papers, $\delta_i^p = 1$ if scientist $i$ is an author of paper 
$p$ and $0$ otherwise, and $n_p$ is the number of authors of 
paper $p$ [19]. This network has $N = 40422$ nodes and an 
average degree of $\langle k \rangle \approx 8.7$. 
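The co-authorship weights used for the SCN, in which each paper with $n_p$ authors adds $1/(n_p - 1)$ to the weight of every author pair on it, can be accumulated in one pass over the papers. The sketch below uses an invented list of author lists; skipping single-author papers is an assumption that follows from the $n_p - 1$ denominator.

```python
from collections import defaultdict
from itertools import combinations


def coauthorship_weights(papers):
    """Return {(a, b): w_ab} with a < b, summing 1 / (n_p - 1) over shared papers.

    `papers` is assumed to be an iterable of author lists.
    """
    weights = defaultdict(float)
    for authors in papers:
        authors = sorted(set(authors))  # ignore duplicate names on one paper
        n_p = len(authors)
        if n_p < 2:
            continue  # single-author papers create no edges
        for a, b in combinations(authors, 2):
            weights[(a, b)] += 1.0 / (n_p - 1)
    return dict(weights)


# Hypothetical corpus of three papers.
papers = [["A", "B"], ["A", "B", "C"], ["C"]]
w = coauthorship_weights(papers)
# A and B share a two-author paper (weight 1) and a three-author paper (weight 1/2).
```

With this choice the strength of an author equals the number of multi-author papers they have written: in the example, author A has strength 1.5 + 0.5 = 2.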

Figure 3 displays the different clustering coefficients as 
function of degree (upper panel) as well as the ratio of 
the weighted clustering coefficients to the unweighted 
coefficient (lower panel). Similarly to [4], $\tilde{C}_B(k)$ remains 
rather close to $C(k)$ for $k < 10$ but for $k > 10$ their ratio 
is somewhat increased, indicating that the weights of 
edges that do not participate in triangles are relatively 
low and/or the weights of edges participating in several 
triangles are relatively high. In contrast, the shape of 
$\tilde{C}_O(k)$ differs from $C(k)$ for $k < 10$. According to Eq. (4), 
the ratio $\tilde{C}_O(k)/C(k)$ in the lower panel reflects the average 
intensity $\bar{I}(k)$ of triangles around vertices of degree 
$k$. The ratio is the largest for low-degree vertices, becoming 
approximately constant at $k \sim 10$. A possible reason 
for this is that young scientists (e.g. graduate students) 
tend to participate in repeated collaborations involving 
a relatively small number of authors, giving rise to high-intensity 
triangles. $\tilde{C}_Z(k)$ appears to capture the low-$k$ 
behavior of $\tilde{C}_O$ as well as the high-$k$ behavior of $\tilde{C}_B$. 

It is clear from the above considerations that there is 
no ultimate formulation for a weighted clustering coeffi- 
cient. Instead, we have seen that the different definitions 
capture different aspects of the problem at hand. For un- 
weighted networks, it is straightforward to measure how 
many edges out of possible ones exist in the neighbor- 
hood of a vertex; yet the questions of how to measure 
the amount of weight located in this neighbourhood and 
what to compare this with, are far from obvious. In a 
sense $\tilde{C}_B$ and $\tilde{C}_O$ can be seen as limiting cases: $\tilde{C}_B$ compares 
the weights associated with triangles to the average 
weight of edges connected to the focal vertex, while $\tilde{C}_O$ 
disregards the strength of the focal node and measures 
triangle weights only in relation to the maximum edge 
weight in the network. $\tilde{C}_Z$ can be viewed as an interpolation 
between these two, albeit a somewhat uncontrollable 
one, as is evident from the examples in Fig. 1. 
Given these observations, our conclusion is that there is 
no single general-purpose measure for characterizing clustering 
in weighted complex networks. Instead, it might 
be more beneficial to approach the problem from two angles. 
While the topological aspect can be described by 
the unweighted clustering coefficient $C$, the importance 
of the triangles can be quantified using the average triangle 
intensities of Eq. (4). 